Production of NLP-oriented Bilingual Language Resources from Human-oriented dictionaries

Authors

  • Vera Fluhr-Semenova
  • Christian Fluhr
  • Stéphanie Brisson
Abstract

This paper considers the main features of manually produced bilingual dictionaries that were originally designed for human use. The problem is to find a way to use such dictionaries to produce bilingual language resources that can serve as a basis for automated text processing, such as machine translation or cross-lingual querying in text retrieval. The proposed transformation technology is based on XML parsing of a file obtained from the source data by a series of special procedures. Automatic procedures suffice to produce a well-formed XML file. In most cases, however, semantic problems and inconveniences remain that can be removed only interactively. The volume of this work can nevertheless be minimized through automatic pre-editing and suitable XML mark-up. The paper presents the results of an R&D project carried out within the framework of the ELRA’1999 Call for Proposals on Language Resources Production. It is based on the authors’ experience with English-Russian and French-Russian dictionaries, but the technology can be applied to other language pairs.
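The pipeline outlined in the abstract (automatic conversion to well-formed XML, followed by interactive correction of the remaining problems) might be sketched roughly as follows. This is not the authors' procedure: the tab-separated input format, the tag names (`dictionary`, `entry`, `hw`, `pos`, `sense`, `tr`), the `review` attribute, and the sample entries are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def entry_to_xml(line):
    """Convert one flat entry 'headword<TAB>pos<TAB>sense; sense' to an <entry>.

    The input layout and all tag names are hypothetical.
    """
    headword, pos, senses = line.rstrip("\n").split("\t")
    entry = ET.Element("entry")
    ET.SubElement(entry, "hw").text = headword.strip()
    ET.SubElement(entry, "pos").text = pos.strip()
    # Senses are separated by ';', translations within a sense by ','.
    for sense_text in senses.split(";"):
        sense = ET.SubElement(entry, "sense")
        for translation in sense_text.split(","):
            if translation.strip():
                ET.SubElement(sense, "tr").text = translation.strip()
    return entry


def needs_review(entry):
    """Flag entries for a human editor: no part of speech, or an empty sense."""
    if not (entry.findtext("pos") or "").strip():
        return True
    return any(len(sense.findall("tr")) == 0 for sense in entry.findall("sense"))


def build_dictionary(lines):
    """Build the XML document and list the headwords that need manual checking."""
    root = ET.Element("dictionary")
    flagged = []
    for line in lines:
        entry = entry_to_xml(line)
        if needs_review(entry):
            entry.set("review", "yes")  # hypothetical marker for the editor
            flagged.append(entry.findtext("hw"))
        root.append(entry)
    xml_text = ET.tostring(root, encoding="unicode")
    ET.fromstring(xml_text)  # raises ParseError if the output is not well-formed
    return xml_text, flagged


# Illustrative entries only; the second one lacks a part of speech.
sample = ["table\tn\tстол; таблица", "run\t\tбежать"]
xml_text, flagged = build_dictionary(sample)
print(flagged)
```

In a sketch like this, well-formedness is guaranteed by construction and re-checked by parsing, while the semantic checks only narrow down which entries an editor has to open.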

Similar resources

An Investigation into Bilingual Dictionary Use: Do the Frequency of Use and Type of Dictionary Make a Difference in L2 Writing Performance?

Bilingual dictionary use in L2 writing test performance has recently been the subject of debate. Opinions differ according to how the trait is understood and whether the system favors the process-oriented or product-oriented views towards the assessment and writing skill. Given the need for more empirical support, this study is aimed at investigating the availability of bilingual dictionary use...

Leveraging Terminological Data for Use in Conjunction with Lexicographical Resources

Integration is the operative term in today’s localization (L10N) and translation environments. Translation departments and companies, as well as some individual translators, who have already grown accustomed to using workbench systems that combine translation memory (TM) with terminology management systems (TMS) are rapidly moving on to include an array of L10N utilities and machine translation...

Building a Basque-Chinese Dictionary by Using English as Pivot

Bilingual dictionaries are key resources in several fields such as translation, language learning or various NLP tasks. However, only major languages have such resources. Automatically built dictionaries by using pivot languages could be a useful resource in these circumstances. Pivot-based bilingual dictionary building is based on merging two bilingual dictionaries which share a common languag...

Generation of Bilingual Dictionaries using Comparable and Quasi Comparable Corpora

The amount of information available on the web is increasing rapidly. The number of internet users is also increasing every day. A significant section of internet users is monolingual. They want to express themselves in their native language and also seek information in it. Hence, multilingual content on the internet is also increasing at a rapid pace. There is a need for systems whic...

Constrained Hidden Markov Model for Bilingual Keyword Pairs Alignment

Bilingual terminology dictionaries are resources of much practical importance in many applications of bilingual NLP. Because technical terminology can be both very specific and rapidly evolving, it can, however, be difficult to obtain dictionaries with good coverage. Automatically mining such terminology from technical documents is therefore an attractive possibility. With this goal in mind, and f...

Publication date: 2000